Training State-of-the-Art Portuguese POS Taggers without Handcrafted Features
Abstract
Part-of-speech (POS) tagging for morphologically rich languages normally requires handcrafted features that encode clues about the language's morphology. In this work, we tackle Portuguese POS tagging using a deep neural network that employs a convolutional layer to learn character-level representations of words. We apply the network to three different corpora: the original Mac-Morpho corpus, a revised version of the Mac-Morpho corpus, and the Tycho Brahe corpus. Using the proposed approach, and without any handcrafted features, we produce state-of-the-art POS taggers for all three corpora: 97.47% accuracy on the Mac-Morpho corpus, 97.31% accuracy on the revised Mac-Morpho corpus, and 97.17% accuracy on the Tycho Brahe corpus. These results represent error reductions of 12.2%, 23.6%, and 15.8%, respectively, relative to the best previously known result for each corpus.
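The abstract only says that a convolutional layer learns character-level representations of words. The following is a minimal pure-Python sketch of one common way to do this, in the style of CharWNN (named in a related abstract below): embed each character, slide a window of k characters over the padded word, apply a set of linear filters to each concatenated window, and max-pool each filter over all positions so every word yields a fixed-length vector. All function names, the sizes (5-dimensional character embeddings, window k = 3, 4 filters), and the random initialisation are hypothetical and chosen for illustration. This is not the authors' implementation, and it omits training entirely.

```python
import random
import string
from typing import Dict, List

Vector = List[float]


def make_char_table(alphabet: str, dim: int, seed: int = 0) -> Dict[str, Vector]:
    """Hypothetical randomly initialised character embeddings (learned jointly in a real tagger)."""
    rng = random.Random(seed)
    table = {c: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for c in alphabet}
    table["<PAD>"] = [0.0] * dim
    table["<UNK>"] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return table


def make_filters(n_filters: int, k: int, dim: int, seed: int = 1):
    """One weight vector of length k*dim plus one bias per convolutional filter."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(k * dim)] for _ in range(n_filters)]
    biases = [0.0] * n_filters
    return weights, biases


def char_conv_embedding(word: str, table, weights, biases, k: int) -> Vector:
    """Convolve over k-character windows, then max-pool each filter over all positions."""
    if not word or k % 2 == 0:
        raise ValueError("need a non-empty word and an odd window size k")
    pad = k // 2
    chars = ["<PAD>"] * pad + list(word) + ["<PAD>"] * pad
    vecs = [table.get(c, table["<UNK>"]) for c in chars]
    pooled = []
    for w, b in zip(weights, biases):
        best = float("-inf")
        for m in range(len(word)):  # one window centred on each character
            z = [x for vec in vecs[m:m + k] for x in vec]  # concatenated window
            best = max(best, sum(wi * zi for wi, zi in zip(w, z)) + b)
        pooled.append(best)
    return pooled  # length = number of filters, independent of len(word)


if __name__ == "__main__":
    D, K, N_FILTERS = 5, 3, 4  # illustrative sizes, not the paper's hyperparameters
    table = make_char_table(string.ascii_letters + "áâãàçéêíóôõúÁÂÃÀÇÉÊÍÓÔÕÚ", D)
    weights, biases = make_filters(N_FILTERS, K, D)
    word_vec = [0.0] * 10  # hypothetical word-level embedding for "ações"
    char_vec = char_conv_embedding("ações", table, weights, biases, K)
    joint = word_vec + char_vec  # joint word representation fed to the tagger
    print(len(char_vec), len(joint))  # 4 14
```

Max pooling is what lets words of any length, including inflected forms such as "ações", map to vectors of the same size, so the character-level vector can simply be concatenated with a word-level embedding before the tagging layers.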
Related articles
Evaluating Solutions for the Rapid Development of State-of-the-Art POS Taggers for Portuguese
Boosting Named Entity Recognition with Neural Character Embeddings
Most state-of-the-art named entity recognition (NER) systems rely on handcrafted features and on the output of other NLP tasks such as part-of-speech (POS) tagging and text chunking. In this work we propose a language-independent NER system that uses automatically learned features only. Our approach is based on the CharWNN deep neural network, which uses word-level and character-level represent...
A POS Tagger for Code Mixed Indian Social Media Text - ICON-2016 NLP Tools Contest Entry from Surukam
Building Part-of-Speech (POS) taggers for code-mixed Indian languages is a particularly challenging problem in computational linguistics due to a dearth of accurately annotated training corpora. ICON, as part of its NLP tools contest, has organized this challenge as a shared task for the second consecutive year to advance the state of the art. This paper describes the POS tagger built at Surukam...
TagMiner: A Semisupervised Associative POS Tagger Effective for Resource Poor Languages
We present TagMiner, a data mining approach for part-of-speech (POS) tagging, an important natural language processing (NLP) classification task. It is a semi-supervised associative classification method for POS tagging. Existing methods for building POS taggers require extensive domain and linguistic knowledge and resources. Our method uses a combination of a small POS tagged corpus and a ...
Choosing a Spanish Part-of-Speech tagger for a lexically sensitive task
In this article, four Part-of-Speech (PoS) taggers for Spanish are compared. The evaluation has been carried out without prior training or tuning of the PoS taggers. To allow for a comparison across PoS taggers, their tagsets have been mapped to the universal PoS tagset (Petrov, Das, and McDonald, 2012). The PoS taggers have also been compared as regards the information they provide and how the...